On signals faint and sparse: the ACICA algorithm for blind 
de-trending of Exoplanetary transits with low signal-to-noise 
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ABSTRACT 

Independent Component Analysis (ICA) has recently been shown to be a promising new 
path in data analysis and de-trending of exoplanetary time series signals. Such approaches 
do not require or assume any prior or auxiliary knowledge on the data or instrument in order 
to de-convolve the astrophysical light curve signal from instrument or stellar systematic noise. 
These methods are often known as 'blind source separation' (BSS) algorithms. Unfortunately 
all BSS methods suffer from a amplitude and sign ambiguity of their de-convolved components 
which severely limits these methods in low signal-to-noise (S /N) observations where their scalings 
cannot be determined otherwise. Here we present a novel approach to calibrate ICA using sparse 
wavelet calibrators. The Amplitude Calibrated Independent Component Analysis (ACICA) allows 
for the direct retrieval of the independent components' scalings and the robust de-trending of 
low S /N data. Such an approach gives us an unique and unprecedented insight in the underlying 
morphology of a data set, making this method a powerful tool for exoplanetary data de-trending 
and signal diagnostics. 



Subject headings: methods: data analysis 
niques: spectroscopic 

Introduction 



methods: statistical — techniques: photometric — tech- 



As we explore smaller and smaller extrasolar 
planet around ever fainter stars, it is unsurpris- 
ing that the need for ever more accurate data- 
calibration and de-trending techniques is a grow- 
ing one. In the recent past, there has been a no- 
table emergence of so called 'non-parametric' data 
de-trending algorithms in the fields of transiting 
extrasolar planet and time-resolved exoplanetary 



spectroscopy ( 


Carter & Winn 


2009; Thatte et al. 


2010 


Gibson et al. 


2012 


Waldmann 


2012 


Waldmann et al. 


2013 


). The use of such 'non- 



parametric' algorithms is a reactionary response 
to the difficulties of calibrating and de-trending 
time series observations when the instrument re- 
sponse function is not known at the precision of 
the science signal to be extracted. Previous, more 
conventional 'parametric' de-trending approaches 
rely on auxiliary information of the instrument 



(e.g. temperature sensor readings, telescope tilt, 
drifts in x and y positions of the science signal 
across the detector, etc.). Such methods have the 
disadvantage of being heavily reliant on the signal- 
to- noise (S /N) of the auxiliary information used to 
de-trend the science data, as well as suffering from 
a degree of arbitrariness in their definition of the 
instrument response function. 

In Waldmann | (2012) and Waldmann et al. 
(2013), we have demonstrated independent com- 
ponent analysis (ICA) as novel de-correlation 
strategy for exoplanetary time series. ICA be- 
longs to the class of blind-source separation (BSS) 
algorithms, which attempt to de-correlate an ob- 
served mixture of signals into its individual source 
components without prior knowledge of the origi- 
nal signals nor the way they were mixed together. 
Such an approach requires the least amount of 
information on a given system and hence ensures 
a high degree of objectivity in the de-trending of 



1 



data. 

1.1. Limitations of conventional ICA 

Whilst it has been shown that ICA is well suited 
to the de-correlation of non-Gaussian signals in 
simultaneously observed exoplanetary time series, 
it has two mayor limitations that will be addressed 
in this paper. These are: 

1 ) Susceptibility to Gaussian noise: to de- 
correlate non-Gaussian signals, ICA is inherently 
limited to a low degree of Gaussian white noise 
in the observed time series observations. This so 
far posed a significant limitation on the types of 
data that can be de-correlated. Waldm ann et al.l 
| (2013) showed that medium to high-SNR space 
based observations are somewhat permissible but 
noisier ground-based observations of exoplanetary 
time series are often out of reach for conventional 
ICA algorithms. 

2) Amplitude and Sign ambiguity: Like all blind 
source separation (BSS) algorithms, ICA can de- 
correlate signals up to an amplitude and sign am- 
biguity. As explained in section [L2| the algorithm 
attempts to simultaneously estimate the source 
signals, s, as well as their respective mixings (the 
mixing matrix), A that represent our observations 
x = A _1 s. Given both s and A are unknown, 
a scalar multiplication of either can be canceled 
by an equal division of the other. Hence no BSS 
algorithms attempt the retrieval of scalar ampli- 
tudes of the source signals s. [Waldmann et al. 



(2013) resolved this by iteratively fitting compo- 
nents of s onto observed out-of-transit data to re- 
trieve the lost scaling factors. Whilst this is a valid 
approach, it again limits us to high-SNR observa- 
tions as too much scatter in the observed time- 
series inhibits a good convergence of such a scaling 
factor regression. 

In this paper we will address both these limita- 
tions by defining ICA in orthogonal wavelet space. 
In the wavelet domain, as explained in later sec- 
tions, we can threshold Gaussian wavelet coeffi- 
cients and increase the signal's sparsity, making 
the ICA algorithm more robust in difficult S/N 
conditions. We can furthermore inject a sparse 
wavelet coefficient calibration signal that allows 
us to directly calibrate the amplitudes of the mix- 
ing matrix A without the need of any post-ICA 
scaling factor regression. 



A quick introduction to BSS and Wavelets 
are given in sections |1.2| & |1.3[ a description of 
the Wavelet-ICA and noise thresholding in sec- 
tion [2] Section |2.1| describes the amplitude cali- 
bration algorithm which is demonstrated in sec- 



tions [3TJ& 3.2 using simulations and Spitzer/IRS 
data respectively. 

1.2. Blind Source Separation 

Besides ICA, other BSS algorithms include 
principal component analysis (PC A), factor anal- 
ysis (FA), projection pursuit (PP), non-negative 
matrix factorisation (NMF), stationary subspace 
analysis (SSA), morphological component analysis 
(MCA) amongst others. For an extensive review 
of these algorithms we refer the interested reader 



to Comon & Jutten ( 


2010 


) as well as relevant ICA 


literature (Oja |1992 


Hyvarinen |1999 Hyvarinen 


& Oja 2000 Hyvar 


nen & Oja. 2001 Stone 


2004; Koldovsky et al. |2006 2005 Yeredor 2000 


Tichavsky et al. 2008 


). Whereas the underly- 



ing statistical assumptions differ significantly, all 
these algorithms take N simultaneously observed 
signals Xk(t), where k is the observed signal index, 
and de-correlate these into the source signals si (t) , 
where I is the source signal index. They all follow 
the functional form 



Xk(t) = Ofe,iSi(t) + a fei 2S 2 (£) + a k , 3 s 3 (t)+ (1) 
+ cifc,4S4(i) + ••■ + a k js N (t). 

where a^; are scaling factors. Assuming that the 
exoplanetary observation consists of a mixture of 
astrophysical signal, s a (t), instrument or stellar 
systematic noise, s sn (t), and the white noise sig- 
nal, s wn (t), from a Gaussian process wn(t), we can 
express equation [I] as sum of vectors (the time- 
dependance has been dropped for clarity): 



N 3n 

E 

1=3 



(2) 



where N sn is the number of systematic noise 
sources. Finally this can also be expressed as 
column vectors x = [sci(i), x^if), ■ ■ ■ ,Xk(t)] T , 
s = [si(f), S2(t), . . . , si(t)] T and the mixing matrix 
A, 



x = As. 



(3) 



2 



We can furthermore define the 'de-mixing matrix' 
W as the inverse of the 'mixing matrix' 



W = A 



(4) 



For a perfect de-correlation of the observed data 
x into its source components, the original mixing 
matrix is the perfect inverse of the estimated de- 
mixing matrix W and WA = I, where I is the 
identity matrix. 

ICA attempts to estimate both s and the de- 
mixing matrix W by assuming that all compo- 
nents of s are statistically independent of one an- 
other. This is achieved by iteratively maximis- 
ing the non-Gaussianity of each signal component 
by estimating their respective Shannon entropies 
{Shannon 1119481 iHyvarinen & Oja 1 120001). For 



more information we refer the reader to lWaldmannl 



[ (2012) and the standard literature. 



1.3. Introduction to Wavelets 

Readers familiar with wavelet decompositions 
may jump to section [2] 

Similar to a Fourier Transform (FT), the 
Wavelet Transform (WT) decomposes a given 
time series signal into its frequency components. 
Where the FT uses sine and cosine functions that 
extend over the full range of the data, the WT 
uses highly localised impulses. These impulses or 
'wavelets' scan through the time series and much 
like a tuning fork to an instrument, 'resonate' with 
localised features in the time series that are akin 
to the wavelet's shape and scaling. The individual 
wavelet basis functions are derived from a single 
mother wavelet ip(t) through translation and di- 



lation of the mother wavelet ( |Percival fc Walden 
2000). Different wavelets exist with different 



analytical properties, here we use the orthogo- 
nal basis wavelets of the Daubechies (db) family 
( Daubechies ||1988 |. The wavelet analogue to the 
Fourier transform of the times series x(t) is given 
by: 



x(t)tp T ^(t)dt 



(5) 



where ip Tj(p (t) is called the 'mother wavelet' for 
a given scaling tp and translation r and c is the 
wavelet coefficient with respect to r and ip. We de- 
fine the mother wavelet for the continuous wavelet 
transform (CWT) as 



V2 



(6) 



The wavelet base is orthogonal and we can 
hence reconstruct the data by taking the sum of 
the product of all coefficients for a given scale and 
translation, c Tj¥ ,, with the respectively scaled and 
translated mother wavelet 



x(t) = ^2 ^2c Ttip ip T ^(t) 



For a more in-depth definition of wavelets and 
their respective properties we refer the interested 
reader 



to 



Walden (2000) 



Daubechies 



( 


1992 


) and 


Percival & 



1.3.1. Multi-resolution analysis 

The above equations apply to the CWT case. 
The wavelet coefficients describe the correlation 
between the wavelet at varying scales (or frequen- 
cies). These can be calculated by changing the 
scale of the wavelet, i.e. the analysis window. We 
can hence speak of a multi-resolution decomposi- 
tion, where each scaling of the mother wavelet de- 
notes a given resolution. Here, the analogy to the 
Fourier Transform would be band-pass filters of 
varying size. Most times it is more sensible to ex- 
ploit the discrete nature of the data and to define 
the discrete wavelet transform (DWT). The DWT 
is significantly easier to implement and faster to 
compute. Similarly to the continuous case, in the 
DWT we have a 'mother' wavelet and a scaling 
function, also known as 'father' wavelet. Here, 
the 'mother' wavelet is denoted by h{t) and the 'fa- 



ther' by^JDaubech^ 

1 |2000| |Press et al. ||2007[ ). It is important to note 
that unlike in the CWT case, where the 'mother' 
wavelet itself is scaled to represent different fre- 
quencies in the data, this is not the case in the 
DWT. In the DWT, in analogy with the Fourier 
Transform, h{t) and g(t) can be thought of as high- 
pass and low-pass frequency filters respectively. 
Different scalings are then achieved by progres- 
sively 'down-sampling' the data. 

The DWT is best understood by following the 
individual steps of the algorithm that computes 
the transform: 
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1. The observed, discrete time series, x(t), is 
convolved with the 'mother' wavelet h(t): 



cD v (t) = (x * h)(t) = x(T)-h(t-r). 

T — — OQ 

(8) 

where [x * h) denotes the convolution of 
x with h and cD v represent the 'mother' 
wavelet coefficients for a given scale <p. As 
mentioned earlier, the 'mother' wavelet, h(t) 
acts as a high-pass filter, sensitive to the high 
frequencies or details of the time series. We 
hence refer to the coefficients of h(t) as detail 
coefficients. 



2. The next step is to convolve the same time 
series, x(t), with the scaling function or 'fa- 
ther' wavelet: 



cA^{t) = (x*g)(t) 



OO 

E 

T — — OC 



x(t) -g{t-T). 



(9) 

As opposed to the 'mother' wavelet, the 'fa- 
ther' wavelet acts as a low-pass filter of the 
time series and its coefficients can be viewed 
as a moving average of the underlying trend 
of x(t). We hence refer to its coefficients 
as average coefficients and denote them with 
cA v . Furthermore, the low-pass filter g(t) is 
related to the high-pass filter by 



g(L-l-t) = (-If ■ h(t) 



(10) 



where L is the filter length and corresponds 
to the number of points in the time series 
x(t). 

3. We now have two sets of time series, a 
low-pass filtered, moving-average time se- 
ries, cA Vl and a high-pass filtered time se- 
ries, cD v . We record the cD v coefficients 
and proceed with our analysis of the average 
coefficients, cA^. Since half of the frequen- 
cies in cA v (namely the high-pass ones) have 
been removed by equation |9j the Nyquist 
theorem tells us that we are oversampled by 
a factor of two. We can hence remove every 
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Fig. 1. — Outline of multi-resolution wavelet de- 
composition down to the 3rd decomposition level. 



second coefficient in cA v without losing in- 
formation. This operation is termed 'down- 
sampling' and abbreviated by \. 2, leaving us 
with N/2 coefficients to describe cA v . Sim- 
ilar applies to the detailed coefficients cD v . 
The detail and average coefficients are hence 
given by: 

cA <p (r) = J2x(t)-9(2T-t) (fl) 

t 

cD v (t) =^2x{t) ■ h(2r -t) (12) 

t 

4. The Nyquist down-sampling introduces the 
concept of scaling or multiple resolutions. 
If we now repeat steps 1-3 on the down- 
sampled average coefficients, cA lpi we obtain 
a new set of coefficients (cA v+ i and cD v+ i) 
on a scale that is double the size of the pre- 
vious decomposition. This is illustrated in 
figure [T] as flow chart and a data example is 
given in figure [2j 

For a given scale, (p, the data can now be recon- 
structed by reversing the above process: 



-At) = {cA^ir) ■ g{-t + 2t)) 

OO 

+ E E (cJVM"* + 2r)) 



(13) 



T — — OO 



where $ are the total number of decompositions. 

For an algorithmic implementation using quadra- 
ture mirror filters (QMFs) see appendix [A] and 
Press et al. I p007t). 
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Fig. 2. — Example of a discrete wavelet transform 
(DWT). TOP: sinusoidal time series with Gaus- 
sian noise and saw-tooth functions superimposed. 
BOTTOM: four level DWT decomposition of a 
noisy sinusoid using symlet-5 wavelets. It can be 
seen that the average coefficients, cA, represent 
a 'moving average' of the data, whereas the de- 
tailed coefficients, cD v , represent bands of higher 
frequencies. 

2. Wavelet ICA 

We now perform the DWT on each observed 
time series, Xk, and obtain a series of average co- 
efficients, cAk, and detail coefficients for a given 
scale, cD kiV . For our ICA decomposition, we use 
these series of coefficients instead of the time do- 
main time-series, Xk, and define our observed data 
as 



= Y,cMt)+y,Y. cD ^ t ) ( 14 ) 



if T 



and similarly we can express our source signals, Si, 
as the wavelet equivalent 



si 



(15) 



From here onwards we will use x to denote the 
wavelet domain presentation of the time-domain 
signal x. We hence restate equation [3] as 



As. 



(16) 



There are several important considerations to 
note here: 

As we are dealing with a multi-resolution analy- 
sis, it becomes possible to perform the blind source 
separation on individual scales (or bandpasses) 
only or to actively exclude some frequency ranges 
from the analysis. Such an approach may be ad- 
vantageous if one has prior knowledge of the sig- 
nals' frequency bandwidths and wishes to restrict 
the impact of high-frequency noise or other signals 
on the BSS of a given signal. An example of this 



is given by Lin & Zhang ( 2005 ) . In this paper, we 



do not take this approach for reasons described in 
section 12.11 



Transforming the observed data into wavelet 
basis sets increases the redundancy of the data. 
This is advantageous in the case of overcomplete 
ICA, where more source signals are present in the 
data than time series observed. This overcom- 
pleteness leads to an improper separation of the 
independent source signals in the data. Increasing 
the data's redundancy has been shown to alleviate 
this problem and makes it possible to efficiently 



use ICA in very restricted data ranges ( |Inuso et 
n|2007 



al 



Mammone et al. ||2012[ ) 



Furthermore as pointed out in section [TTTj ICA 
is inefficient in the presence of high frequency scat- 
ter. This is alleviated in the wavelet space as by 
the virtue of the multi-resolution analysis, most 
high-frequency scatter is contained in the cD\ co- 
efficients and a better degree of separation can be 
achieved for lower frequency systematics. We will 
demonstrate this property in section [3T2] It is also 
possible in wavelet space to selectively suppress 
Gaussian noise via soft or hard thresholding (e.g. 
Stein |1981||Donoho |1995| ) and hence increase the 
robustness of the BSS algorithm in low S/N con- 
ditions. 

2.1. Amplitude Calibrated ICA (ACICA) 

Arguably the central problem with an exoplan- 
etary data de-trending algorithm based on ICA is 
the scale and sign ambiguity of the de-correlated, 
independent components. 

Whilst we do not know the scaling of individ- 
ual source components in s, we know by defini- 
tion of the ICA pre-processing step (whitening) 
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that each source signal is normalised to unit vari- 
ance, E[sf] = 1, and that x = As. Hence the 
elements of A do not need to preserve the abso- 
lute but the relative scalings of s. If we know the 
absolute scaling of one source signal, we can hence 
solve the scaling degeneracy of all signals. This is 
usually possible for simulations where we control 
the inputs but not for real life examples where the 
source signals and their scalings are a priori un- 
known. One way to solve this predicament is via 
the introduction of a calibration signal (CS). Such 
a CS must have the following properties: 

1) Minimal to no impact on the data: The intro- 
duction of a CS should not distort any underlying 
signals or their amplitudes. 

2) Temporal localisation: The CS should not be 
located in in-transit (INT) regions and not take 
up too much out-of-transit (OOT) data. 

3) Stability to noise: The CS should not be af- 
fected by the noise of the data (be it Gaussian or 
otherwise) . 

4) Non-Gaussianity: The signal should be non- 
Gaussian enough for the ICA to recognise it but 
not more non-Gaussian than the science signal it- 
self. A too prominent CS could bias the ICA to- 
wards the retrieval of the CS and could impair the 
retrieval of other non-Gaussian signals. 

In the time domain, it is difficult to implement 
property 4 without violating property 1. In or- 
der to implement a distinct enough non-Gaussian 
signal in the time domain, one needs to signifi- 
cantly alter large sections of the data. Also the 
treatment of noise (property 3) is problematic. In 
the frequency domain, property 2 is violated as the 
Fourier transform does not contain temporal infor- 
mation and the CS would superimpose the science 
signal. However, all four criteria can be met in the 
wavelet domain. 

2.1.1. Injecting the Calibration Signal 

In the wavelet domain, we can introduce a much 
sparser CS than in the time domain and have con- 
trol over its temporal location (unlike in the fre- 
quency domain) via the lag term r. Given that 
x is more redundant than x, we can minimise the 
impact on the observed data or avoid any alter- 
ations to the data altogether by selecting wavelet 
coefficients of x with zero or near-zero amplitudes 
to be used for our CS. We can now re-define equa- 



tion [2] i 



JY 



Xk 



- bkS c = a k ys c + ^ 



a>k tsi 



(17) 



1=2 



where x% denotes the observed data with CS, s c is 
the CS with the same dimensions as s; and bj~ is 
a random but known scaling constant. We define 
Sr as 



cD v (t) = 1 



if t 



X) T cA(t) + J2p J2 T cD v ( t ) = otherwise 

(18) 

where are pre-selected lags for a given scale that 
correspond to a section of the OOT data. Note 
that: 

1) No average coefficients, cA, are used in s c 
since these are not redundant enough. 



2) As equation 17 suggests, only N number 
of independent components can be extracted. 
Adding the CS in the the observed data x may 
render the ICA overcomplete as one source sig- 
nal will not be retrieved in favour of the CS. For 
large data sets (N observations > N signa i s ) this is 
generally not a problem. 

3) We choose < b\ < |max|xfe| to avoid the 
CS to be the most dominant feature in the data. 



4) In equation 18 we have chosen to define s c 



to contain one non-zero coefficient per scale tp and 
lags corresponding to the same area of OOT data 
in the time-domain. This allows for efficient use 
of OOT data and a high sparsity in the wavelet 
domain but is entirely arbitrary otherwise. 

2.1.2. Retrieving the scaling information 

Having injected the calibration signal into our 
observations, we can now perform the ICA decon- 



volution (section 1.2 1 on the data with CS in the 
wavelet domain, x c . We identify the CS in the 
retrieved source signals and denote their respec- 
tive elements of the mixing matrix, A as a c k . By 
measuring the average amplitude of the wavelet 
coefficients comprising the retrieved CS, (s c ), and 
knowing the original CS amplitude, bf., 



(19) 
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we can retrieve the scaling of the CS as well as 
those of the other signals contained in s. We de- 
note this calibrated mixing matrix as O with its 
elements given by 



Ok i 



o-k.i 



bk 

(s c Y 



(20) 



2.1.3. Calibration error 



Using the above calibration approach we con- 
sider the total error on the independent compo- 
nents (ICs) to be a combination of source separa- 
tion error (SSE) of the individual IC and the SSE 
of the calibration signal components, s c . The SSE 
for the I th source signal can be estimated by 



E EiIl A+k G ll] 



k,l = 1,2,..., N. (21) 



where G is called the 'gain matrix' and defined 
as G = WA. For perfectly separated sources, 
we hence have G = WA = I where I is the iden- 
tity matrix. The ICA algorithms employed here 



(EFICA and WASOBI; 


Koldovsky et al. 


2006 


Yeredor 2000 


Tichavsky et al. 


2006b 


) can be 



shown to be asymptotically efficient and converge 
to the correct solution in the limit of Ni ter oo 
iterations. However in real world scenarios this is 
not the case and we find that the estimated mix- 
ing (or de-mixing) matrix is only approximately 
equal to the true underlying mixing matrix A, i.e. 
W ~ A -1 . To estimate the SSE, we can hence 
inspect the variance of the matrix G. Whilst 
it is possible to directly calculate G for simula- 
tions, the true mixing of the signals is usually un- 



known in real data applications. Tichavsky et al 
(|2006a|);|Koldovsky et al. I d2006b and I Tichavsky 



et al. (2008) have shown that an asymptotic esti- 
mate of G is nonetheless possible. For a derivation 
we refer the reader to the cited literature. 

We define the source separation error for the 
calibrated components in s c as the quadrature er- 
ror of the individual source components' error and 
the SSE of the calibration signal 



(22) 



3. Application Examples 
3.1. Simulations 

In this section we illustrate the ICA calibration 
approach using simulations. 

1) For this we generated five input signals: a 
Mandel & Agol (2002) secondary eclipse curve 



of HD189733b, a Gaussian process with FWHM 
of 2xl0 -3 amplitude, and three instrument noise 
vectors derived from HST/NICMOS observations 
( Waldmann et aL~|2013 |. These input signals are 
shown in figure |3| 

2) We mix those signals using a randomly gen- 
erated mixing matrix to obtain our 'observed' time 
series, x, figure [4] 

3) Using the MATLA^j] w avelet-toolbox and 



daubechies 4 (db4) wavelets (Daubechies 1988 



1992) we calculated the discrete wavelet trans- 
form for four scales ($ = 4) of each time series in 
x to obtain x (figure [5] black and blue curves) . In 
the scope of this simulation we found the choice 
of wavelet and number of vanishing moments not 
to matter greatly. However the choice of wavelet 
and decomposition depth may vary from data set 
to data set. 

4) We generated the calibration signal, s c , to 
consist of one wavelet coefficient per scale <p giving 
$ total number of non-zero coefficients. For each 
series, Xk, the calibration signal was multiplied 
with the random scaling factor bk, i.e. s c _k — bkS c . 
Figure [5] (red peaks) shows the scaled calibration 
signal for each x^. Note that we chose the lags 
of the non-zero coefficient to correspond to the 
95*^ percentile of each scale. This guarantees the 
CS to be located in the post-eclipse out-of-transit 
data. It also overlaps the differently scaled wavelet 
impulses and hence minimises the impact on the 
time-domain data representation. See figure [6] for 
a time-domain representation of x c k=7i (third from 
top in figure [5| . 

5) The blind source separation via independent 
component analysis as described in Waldmann 
(2012) was performed and the time-domain repre- 



sentation of the retrieved ICs are shown in figure[7] 
6) Following section 2.1.2 we calculate the 
scaled mixing matrix O and apply the scalings 
to individual source signals. Figure [8] shows the 



J http: / /www. mathworks.co.uk 
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observed data z£ =3 from figure [6j blue circles, and 
the re-constructed data using the scaling matrix 
Ok=3,i, green crosses. The scaled ICs are shown 
offset underneath. 

3.2. Spitzer/IRS: HD189733b secondary 
eclipse 

We test the proposed algorithm on Spitzer/IRS 



(Houck et al. 2004) observations of a secondary 
eclipse of HD189733b. These observations were 
obtained in November 2006 (program id: 30473) in 
low resolution mode ranging from 7.46 to 14.29/im. 
The secondary eclipse was followed for 5:48 hours 
with integration times of 14.7 seconds per ramp. 
The Spitzer pipeline calibrated data were reduced 
using the Spicd^] spectral reduction software. Ex- 
amples of the resulting time series are shown in 
figur es [9] & [To] (blue crosses). Similar to sec- 
tion |3.iy we take these time series as inputs to 
our algorithm. Figure 11 shows the DWT, us- 
ing db4 wavelets, of the time series at 7.6971/im 
(black lines) with the injected calibration signal 
over plotted in red. Note that no binning in wave- 
length was performed before, which marks a sig- 
ni ficant difference to the HST /NICMOS analysis 
in Waldmann et al. (2013), where a relatively 



coarse binning was necessary to reduce the Gaus- 
sian noise. 

We now follow through each individual step as 
described in the previous sections and obtain the 
scaled independent components of the data set. 
Figures [9] fc [l0| show two observed time series (blue 
crosses) with the correctly scaled individual in- 
dependent components (black dots) underneath. 
The first and second ICs comprise the secondary 
eclipse signal and the calibration signal respec- 
tively. These are also shown in figure [12] The re- 
maining ICs are instrument or stellar systematic 
noise. The amplitudes of these systematic com- 
ponents can be seen to be lower toward the blue 
end of the spectrum (figure [9]), which is also evi- 
dent in a cleaner observed time series, and more 
pronounced toward the red end of the spectrum 



(figure 10). In figure 12 we over plotted the best 



fit Mandel & Agol (2002) model using a MCMC 



fitting routine (Waldmann et al. 20131 with the 
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Fig. 3.- 
bottom: 

HST/NICMOS instrument (|Waldmann et al. 



2013); a Mandel fc Agol (2002) lightcurve of the 



secondary eclipse of HD189733b; Gaussian noise 
at with FWHM of 2xl0~ 3 normalised flux. 




transit depth as only free parameter. The transit 



0.47 0.48 0.49 0.5 0.51 0.52 0.53 
Orbital Phase 

Fig. 4. — Signals from figure [3] mixed together, 
using a random mixing matrix, to created the ob- 
served signals x. 



version 2.5. http://irsa.ipac.caltech.edu/data 



/SPITZER/docs/dataanalysistools/tools/spice/ 
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Fig. 5.— BLUE AND BLACK: Wavelet trans- 
form of signals in figure [4j the order corresponds 
to order in figure [| (colours for clarity). GREEN 
dotted: denoting scale boundaries from average 
coefficients cA to the highest frequency scale cD\ . 
RED: Calibration signal for each series x over- 
plotted. Four peaks are visible (one per scale) 
close to their scale boundaries. Their amplitudes 
are defined by the random scaling vector b. 
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Fig. 7. — Time domain representation of the in- 
dependent components, s, retrieved. First compo- 
nents is the secondary eclipse signal, second com- 
ponents the calibration signal. Note that these are 
not scaled yet. 
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Fig. 6. — BLACK: Time domain representation of 
third observed component xi =3 after the calibra- 



tion signal was added (in figure [5]). RED: Calibra- 
tion signal only. Note the temporal location of the 
four frequency signal (red spikes in figure [5]) at the 
far edge of the post-egress out-of-transit data. 



Fig. 8. — TOP: Third observed time series (££ =3 , 
figure [6j blue circles) , over-plotted (green crosses) 
reconstructed signal using scaled components, de- 
scribed in section [2. f .2! Note the excellent match 
between observed and reconstructed time series. 
BOTTOM black: scaled independent components 
of the above signal. 
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depth posterior distribution is shown in figure 13 
The total error is the quadrature sum of of =1 and 
the MCMC derived standard error a mc . We ob- 
tain at 7.6971/Ltm a planet/star contrast ratio of 
Fp/Fs = 0.00415 ± 0.00015 which we find con- 
sistent with the measurement by |Grillmair et al. 
| (2007). Figure 12 also shows the retrieved cali- 
bration signal (black dots) with the original cali- 
bration signal over plotted in red. The excellent 
match between both indicates an adequate signal 
separation and the correct scaling of the indepen- 
dent components. 

In order to demonstrate the increased efficiency 
of the proposed ACICA algorithm in the presence 
of noise, we also show the ICs derived by perform- 
ing the more 'traditional' ICA in the time domain 
only (figure 14 1. The components in figure 



14 



are 

poorly separated and the standard ICA analysis 
did not converged with traces of the secondary 
eclipse feature present in three separate compo- 
nents. 

4. Discussion 

Using ACICA, we can directly retrieve the sig- 
nal amplitudes of the source components com- 
prising our data. It not only allows us to non- 
parametrically de-trend low S /N data sets but also 
allows for a unique insight into the structural make 
up of the observations. A study of the systematic 
components and their amplitudes offers a power- 
ful diagnostic on systematic noise behaviour across 
a detector. As shown in the Spitzcr/IRS exam- 
ple, amplitudes of systematic components in the 
data can vary significantly across the dispersion 
of a grism. A study of these systematic compo- 
nents may allow, amongst others, for an optimi- 
sation of residual flat-fielding errors across a field 
or the characterisation of wavelength dependent 
slit-losses of the instrument. As data diagnostic 
we can hence define the S/N of a time series k in 
terms of its systematic noise 



SNR s k ys 



AF, 



(23) 



where <J 2 {s c sn i fc ) is the variance of a given system- 
atic noise component at for the k th time series and 
AFk is the respective measured transit depth. 
The choice of wavelet family can be an impor- 
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Fig. 9.— Spitzer/IRS observations of HD189733b 
at 7.6971^m. BLUE CROSSES: Raw time series. 
BLACK DOTS: first 11 most non-Gaussian inde- 
pendent components comprising the raw time se- 
ries. The first and second components are the sec- 
ondary eclipse curve and the calibration signal re- 
spectively. Remaining components are instrument 
or stellar systematics. 
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Fig. 10. — Spitzer/IRS observations of 

HD189733b at 11.6285/zm. BLUE CROSSES: 
Raw time series. BLACK DOTS: individual 
independent components in the same order as in 
figure [9j Note the significantly higher systematic 
noise amplitudes at the red-end of the spectrum 
compared to the blue end in figure [9] 
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Fig. 11. — BLACK: Wavelet transform presenta- 
tion of time series in figure [9] at 7.6971/im. RED: 
Calibration signal injected in the detailed coeffi- 
cients. 
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Fig. 13. — MCMC generated posterior distribu- 
tion of transit depth parameter for lightcurve fit 
in figure |12| Distribution mean and one sigma 
bounds are indicated with red continuous and dis- 
continuous lines respectively. 
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Fig. 12. — TOP: secondary eclipse component 
at 7.6971/im (first independent component in fig- 
ure [9) with |Mandel fc Agol | < |2002[ > model over 
plotted (red). BOTTOM: retrieved calibration 
signal (black dots) with original input calibration 
signal over plotted (red line). 
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Fig. 14. — First five (unsealed) independent com- 
ponents of ICA analysis performed in the time do- 
main. Given the low signal to noise of the data the 
convergence of the ICA algorithm is limited and it 
cannot separate individual sources when directly 
applied in the time domain. Compare this to fig- 
ure [9] where ICA was performed on the wavelet 
coefficients and the source signals were separated 
correctly. 
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tant factor in the convergence properties of the 
code. For noisy data we find a db4 wavelet base 
to be the best compromise between slowly vary- 
ing continues functions and step-like, discrete off- 
sets in the data. Should the data's systematics be 
dominated by offset patterns, such as those often 



induced by nodding (e.g. Grillmair et al. 20071, 



we find a Haar wavelet to give a sparser repre- 
sentation of the data than the Daubechies family. 
Similarly, high order Daubechies wavelets or sym- 
metric wavelets (Symlets) are best suited if one's 
data is defined by continuous evolving trends. 

As mentioned in section [2j Gaussian noise can 
be suppressed in wavelet space by the selective 
thresholding of detailed coefficients. Whilst true 
in theory, we found it difficult to implement in 
practice and great care needs to be taken in the 
choice of thresholding rule and amplitude to avoid 
'over cleaning' and distorting of the underlying 
data. Fortunately we find that convergence prop- 
erties of the ICA de-convolution are much en- 
hanced by the transformation of the data into 
wavelet space alone and further cleaning was (in 
the scope of this analysis) unnecessary. 

5. Conclusion 

In this paper we present a novel approach to 
the amplitude calibration of independent compo- 
nent analysis (the ACICA algorithm) via the intro- 
duction of a sparse calibration signal in orthogonal 
wavelet space. By transforming observed time se- 
ries into series of wavelet coefficients, we are able 
to overcome two main limitations of blind source 
separation algorithms such as ICA: 1) the conver- 
gence of ICA algorithms is strongly impaired by 
the presence of Gaussian noise in the data. By 
transforming data to a sparser and more redun- 
dant basis we can significantly improve the perfor- 
mance of the ICA de-convolution without other- 
wise altering the data. 2) In the wavelet domain it 
is possible to inject an artificial calibration signal 
of known amplitude into the data without signifi- 
cant or any impact to the original data. With the 
use of such a calibration signal we can directly de- 
termine the individual absolute amplitudes of the 
derived independent components. This marks a 
significant improvement over methods such as they 



are discussed in Waldmann et al. (2013) where 



component amplitudes were derived through a re- 



gression analysis to existing out of transit baseline 
data. 

ACICA allows us to de-trend otherwise in- 
accessible data sets non-parametrically. We 
demonstrated this using simulations and archival 
Spitzer/IRS data. It furthermore offers us an un- 
precedented and unique insight into the morphol- 
ogy of a data set by allowing us to directly map 
out temporal/wavelength dependent variations of 
instrumental or stellar noise in the data set. 

Together, these attributes make the algorithm 
proposed here a highly versatile and powerful tool 
for exoplanetary time series analysis. 
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A. The quadrature mirror filter 



A very fast and simple implementation of the DWT for multi-resolution decomposition is by construct- 
ing the quadrature mirror transformation matrix. For a Daubechies-4 wavelet we obtain four coefficients 
comprising h(t) and similarly g(t). Rather than convolving the time series, x(t) with both filters separately 
and then down-sample, we can also construct a matrix where each odd row contains h(t) and each even row 
contains g(t) coefficients. This automatically down-samples the data to the new resolution ip + 1. Such a 
matrix is called a 'quadrature mirror filter' (QMF) and equation Al is an example of such (Press et al. |2007) 
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where the empty spaces denote zeros. To obtain a DWT using this QMF, we multiply the QMF with the 
column vector containing the time-series on the right. Note the circular behaviour of the matrix at the 
bottom, where the wavelet coefficients wrap around to the beginning. This has important consequences as it 
indicates that the DWT wraps around the data and the transform at the end of the time-series is sensitive 
to data at the beginning of the time series. This effect can be avoided by adding sufficient zero- valued points 
to the time series at its beginning and end. This process is also known as 'zero-padding'. 
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